ABSTRACT



A dynamic software update facility (DSUF) is installed 
in a data processing system for the purpose of non-dis- 
ruptively replacing old operating system programs or 
modules with new updated versions thereof while pro- 
viding continuous availability and operation of the sys- 
tem. The new versions are loaded into the system along 
with change instructions providing information con- 
trolling the update. Task or process control blocks con- 
tain markers indicating the corresponding tasks are safe 
or unsafe to run the new programs. The markers are set 
initially to unsafe. A change descriptor table is stored 
and contains control information derived from the 
change instructions. When the DSUF is activated, an 
interrupt handler is installed and traps are stored in the 
old programs at entry points and safety points therein. 
Entry point traps are tripped when a task or process 
enters the old program and interrupts are generated that 
are handled by the interrupt handler to route tasks 
which are unsafe to the old program and tasks which 
are safe to a new program. When all tasks are safe, the
new programs replace the old programs. When safety 
point traps are tripped, a task or process may change its 
state from unsafe to safe when predetermined condi- 
tions are met. 

19 Claims, 4 Drawing Sheets
METHOD OF OPERATING A DATA PROCESSING
SYSTEM HAVING A DYNAMIC SOFTWARE 
UPDATE FACILITY 

BACKGROUND OF THE INVENTION 

This invention relates to the field of data processing, 
and, more particularly, to improvements in a method 
for dynamically making software changes in a running 
system. 

There are commercially available data processing 
systems such as IBM ESA/390 data processing systems, 
which operate with many resident programs or modules 
such as those of the commercially available IBM 
MVS/ESA operating system. ("IBM", "ESA/390",
and "MVS/ESA" are trademarks of International Business
Machines Corporation.) When a system is running,
such resident modules are accessible to each other in 
many different ways, and multiple tasks and processes 
can independently access the programs. From time to
time, various operating system modules are updated and 
it becomes necessary to substitute new versions for the 
old versions. The problem thus exists of how to effect 
non-disruptive replacement while the system is running 
and in consideration of the complex environment where
one or more different processes are concurrently using
the programs being replaced.

The general problem is known and has been recog- 
nized in the prior art. A paper, "Change Programming 
in Distributed System", by G. Etzkorn, International
Workshop on Configurable and Distributed Systems,
pages 140-151, London, UK, Mar. 25-27, 1992,
describes a method of dynamically reconfiguring
programs in a system in which the programs communicate
by message passing between ports. Reconfiguration
occurs only when the system has reached a
"reconfiguration state" and stays in such state while the changes
are applied or made. The method requires a first series
of reconfiguration commands that place the system in
the reconfiguration state and then a series of change
commands which effect the change. A change is made
by reconfiguring an old version out of the system and
configuring a new version into the system. The
invention differs from such a system in several ways but the
major points of distinction are as follows. First, the
invention is not based on message passing but upon the 
use of entry points and safety points and the normal 
interaction of processes with the programs to be 
changed. Second, in the invention, both old and new 
programs may be executed concurrently via multitask- 
ing while the system described in the paper completely 
reconfigures an old program out of the way. 

Another paper "Dynamic Program Modification in 
Telecommunication Systems", by O. Frieder et al., 
Proceedings of the IEEE SEVENTH CONFER- 
ENCE ON SOFTWARE ENGINEERING FOR 
TELECOMMUNICATION SWITCHING SYS- 
TEMS, pages 168-172, 1989, proposes a solution for a 
subset of the problems in a distributed telecommunications
environment. The updating process described in
this paper replaces programs having plural procedures,
one procedure at a time. "The updating system interrupts
the program and examines the current state of its
runtime stack. Based on this information and the list of
all procedures that each procedure can call (generated
by the language compiler), the updating system calculates
when each procedure may be updated. Updating a
procedure involves changing its binding from its current
version to the new version. When all procedures
have been replaced by their new version, the program
update is complete." (pg. 169) A "procedure . . . that has
changed between versions may be updated only when it 
is not active." In contrast, the invention updates active 
tasks and uses entry points and safety points, and does 
not examine any stack. The invention allows for con- 
current execution of the old program and the new pro- 
gram by multiple tasks. Also, the invention does not 
require interception of every program exit.

The invention involves the use of "safety points" 
which are system observable events and conditions. 
These events and conditions control the routing of tasks 
to the old program or to the new program. One may 
relate the concept of a safety point in a program to a 
sync point in a database (DB) transaction. All DB 
changes must be permanently written in the data base 
once a sync point is reached, all of them should be 
backed out if the transaction aborts prior to reaching a 
sync point, and in some database managers, none of the 
changes are visible to other transactions until a sync 
point has been reached. The differences between the 
DB sync point and the program change safety points
are:

Safety points are chosen anew with each change to
the program. Sync points usually remain the same,
even if the program flow or the database structure
changes.

Safety points most often reside in modules which are 
not being changed, while sync points are often 
embedded in the program constituting the transac- 
tion. 

Sync points are either explicit (system call), or trivi- 
ally implicit (end of transaction implies a sync 
point). Safety points are explicit, but cannot be 
observed in the program. They must be specified 
externally to the program. It is not possible to code 
a system call saying "This task is now Safe for any
change".

Safety points lose their meaning when the change is 
fully applied or the system is restarted with all new 
modules. Though the code in and around the safety 
point continues to execute, it bears no further significance
as a safety point. The sync point is part of
the ongoing logical significance of the program.
These differences also apply when comparing the 
concept of safety points and the concepts related to 
Checkpoint-Restart, and the points in time when the 
latter can be performed. 
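
The point that safety points are specified externally to the program, and chosen anew for each change, can be pictured with a small sketch. The Python fragment below is only an illustration under assumed names: `SafetyPoint`, `ChangeInstructions`, `is_safety_point`, and the module names `DISPATCH` and `IOEXIT` are hypothetical and do not come from the specification. It models a change package that carries its own list of safety points, so that the same location in a program is a safety point for one change and an ordinary instruction for another.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SafetyPoint:
    """An externally specified event: a task reaching `offset` in `module`."""
    module: str
    offset: int


@dataclass
class ChangeInstructions:
    """Hypothetical stand-in for the change-instructions shipped with a change."""
    change_id: str
    safety_points: set = field(default_factory=set)

    def is_safety_point(self, module: str, offset: int) -> bool:
        # The program itself carries no marker; only this external list knows.
        return SafetyPoint(module, offset) in self.safety_points


# Each change chooses its safety points anew.
change_1 = ChangeInstructions("CHG1", {SafetyPoint("DISPATCH", 0x40)})
change_2 = ChangeInstructions("CHG2", {SafetyPoint("IOEXIT", 0x10)})
```

Once a change is fully applied, its `ChangeInstructions` object can simply be discarded; the code at the listed offsets keeps running but no longer has any significance as a safety point.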

SUMMARY OF THE INVENTION 

The problem of replacing program modules while a 
system is running has been an open issue for many years. 
There are numerous sub problems which make a solution
difficult to implement. This invention describes a
method that solves most known sub problems. The 
method is applicable to most commercial operating 
systems, including MVS/ESA and UNIX (TM AT&T)
operating systems. The constraints within which the 
solution must operate, are: 

1. The method should handle arbitrarily unstructured 
code, which may be called concurrently by multi- 
ple processes, using any method of call which is 
physically possible with the underlying machine
architecture. 

2. The running code (the old version which is being 
changed) should not have required or otherwise 
undergone a restructure, a rewrite or other modification
in order to position it for the dynamic
change at hand. An "ordinary" change should be
applied to "ordinary" and existing code with the
help of an external facility, and with the help of an
administrative process.
3. Process blocking (quiescing) during implementa- 
tion of a dynamic change must be kept to a mini- 
mum. Deadlocks are prohibited. 
There are also many problems that need to be solved.
Five problems are discussed next. 

Problem 1: Coordinating Concurrent Changes to 
Multiple Modules— This is the most complex problem 
in managing dynamic changes. It revolves around the 
dependency of multiple processes on different versions
of shared modules. For example, suppose two processes
P1 and P2 call program A and program B occasionally.
Changes to programs A and B involve loading two
new, updated programs A' and B' into memory. It is
possible that process P1 is at a point in which all new
calls to A or B can be and should be routed to the new 
versions, namely, A' and B', while process P2 is at a 
point where it is still dependent on the old version of A 
and B, and invoking a new version will cause an error. 

There are two common techniques to resolve this
problem. One technique is to shut down the whole 
system before the change is implemented, and restart it 
after loading the new versions of the programs. The 
shutdown guarantees (in most cases) that both processes 
P1 and P2 are at a point where there is no outstanding
dependence on programs A and B. Naturally, this solu- 
tion suffers from the disadvantage that it causes a pro- 
longed system outage which is often undesirable. 

The second technique is to require that each process 
be contained within a "transaction", i.e., a short,
independent work request. A transaction processing system
is then replicated by hardware and software
redundancy. Initially, all the transactions are processed by
one copy of the transaction processing system (TPS)
while the other copy is idle. The change is implemented
on the idle copy of the TPS, which in turn begins to
process all newly arriving transactions. All new
transactions unconditionally execute the new version of the
program. Meanwhile, the original copy of the TPS
continues to execute the old transactions (unconditionally
calling only the old version of the program) until
they are all completed, at which time it becomes idle
and is then a candidate for change. This solution does not
commonly work in "legacy systems" which are not
structured to process in this manner. The need for
dynamic change exists nevertheless in those legacy
systems and a solution which does not require a major
restructure is desired. Also, this solution requires a
considerable amount of redundancy which implies
increased cost. This solution does not address the cases of
long running processes (e.g. "batch jobs") which are
never rerouted to the new system, and may not enjoy
the benefits of dynamic change. Such long running
processes also may delay indefinitely the time when the
original system can be changed again. Also, many
application program changes carry dependencies which
survive beyond the life of a single transaction. Thus,
new transactions are not all eligible for executing the
new code and some may have to still execute the old
code.

The invention solves this problem by performing 
conditional routing. Whenever a process invokes a pro- 
gram for which more than one version exists, the system 
routes the call to the version required by this process,
based on the state of the given process. 
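
Conditional routing of this kind can be sketched in a few lines. The following Python fragment is a minimal illustration, not the patented implementation: the names `Safety`, `Process`, `VERSIONS`, and `route_call`, and the stand-in functions `old_a` and `new_a`, are assumptions introduced here. It shows two processes calling the same program name at the same time and each being routed to the version its own state requires.

```python
from dataclasses import dataclass
from enum import Enum


class Safety(Enum):
    UNSAFE = "unsafe"
    SAFE = "safe"


@dataclass
class Process:
    name: str
    state: Safety = Safety.UNSAFE  # every process starts out unsafe


def old_a(x):
    """Stands in for old program A."""
    return f"A({x})"


def new_a(x):
    """Stands in for updated program A'."""
    return f"A'({x})"


# One (old, new) pair per program for which two versions currently exist.
VERSIONS = {"A": (old_a, new_a)}


def route_call(process, program, *args):
    """Route the call to the version required by the calling process."""
    old, new = VERSIONS[program]
    target = new if process.state is Safety.SAFE else old
    return target(*args)


p1 = Process("P1", Safety.SAFE)  # already past its safety point
p2 = Process("P2")               # still dependent on the old version
```

With these definitions, `route_call(p1, "A", 7)` runs the new version while `route_call(p2, "A", 7)` runs the old one, so both versions execute concurrently without either process being blocked.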

Problem 2: Physical Replacement of Isolated Mod- 
ules, i.e., replacing a single module independently of 
any other change and when no synchronization be- 
tween multiple tasks is required— This problem is 
readily solved in an environment where all entries to 
the module are conveniently intercepted by the system. 
This situation exists when entry to the module is ef- 
fected via some system call (LINK, ATTACH, FORK, 
EXEC, etc.). A straightforward solution is to route all
calls which are directed at the module, and which were 
issued after the request for change was accepted by the 
system, to a new version of the module. 
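
For the easy case, where every entry passes through a system service, the straightforward solution amounts to swapping one table entry. The Python sketch below illustrates this under assumed names: `MODULE_TABLE`, `register`, `link`, and `accept_change` are hypothetical and only loosely modeled on a LINK-style service. Calls issued after the change is accepted reach the new version, while calls made earlier have already run against the old one.

```python
# Hypothetical table consulted by the system on every intercepted call.
MODULE_TABLE = {}


def register(name, entry):
    """Make a module callable by name through the system service."""
    MODULE_TABLE[name] = entry


def link(name, *args):
    """System-call style entry: every invocation is looked up here."""
    return MODULE_TABLE[name](*args)


def accept_change(name, new_entry):
    """From this moment on, route all calls for `name` to the new version."""
    MODULE_TABLE[name] = new_entry


register("REPORT", lambda n: f"old:{n}")
before = link("REPORT", 1)   # issued before the change request
accept_change("REPORT", lambda n: f"new:{n}")
after = link("REPORT", 1)    # issued after the change request
```

The sketch also makes the limitation visible: a caller that holds a direct reference to the old function and bypasses `link` is never rerouted, which is the harder case taken up next.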

However, the problem is substantially more difficult 
if callers are allowed to call the module in ways which 
bypass all operating system services. In addition, forc- 
ing every program invocation to be constantly filtered 
by the operating system, lest there be a change pending, 
may consume substantial processor resources, and is 
considered prohibitive in a system in which there are 
frequently called functions such as those included in 
operating system kernels. 

A second possible solution is to use the machine's
ability to create interrupts based on program behavior 
such as program event recording (PER) in the 
ESA/390 system. This mechanism is limited in its abil- 
ity to handle multiple concurrent outstanding changes, 
and imposes a substantial performance penalty on the 
system. 

Another solution which has been suggested, is to 
modify the original program code in memory in a way 
that it will redirect the execution of the program to a 
new version loaded elsewhere in memory. Such inter- 
ception needs to be specific to the module at hand and 
either cause a debugger to be invoked, or force uncon- 
ditional execution of the new version of the code. This 
solution lacks the ability to coordinate between changes 
to multiple modules. In accordance with the inventive 
solution, a combination of comprehensive trapping with 
registration and filtering of safety points is used as de- 
scribed below. 
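
The idea of trapping with registration can be illustrated with a toy model of memory. The Python sketch below is an assumption-laden illustration rather than the described facility: `TrapRegistry` and its methods are hypothetical names, memory is modeled as a `bytearray`, and the use of a zero byte as the trap follows the trap byte given later in the detailed description. It shows a trap overwriting the first byte at an address, the original byte being saved, and a lookup that separates registered traps from unrelated invalid instructions.

```python
TRAP_BYTE = 0x00  # the detailed description uses a hex 00 byte as the trap


class TrapRegistry:
    """Toy model: traps are written into memory and registered by address."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> (original byte, trap kind)

    def install(self, address: int, kind: str) -> None:
        """Save the byte at `address`, then overwrite it with the trap."""
        self.saved[address] = (self.memory[address], kind)
        self.memory[address] = TRAP_BYTE

    def classify(self, address: int):
        """Return the trap kind, or None if the interrupt is not ours."""
        entry = self.saved.get(address)
        return entry[1] if entry else None

    def remove(self, address: int) -> None:
        """Restore the original byte and forget the trap."""
        original, _ = self.saved.pop(address)
        self.memory[address] = original
```

A handler built on such a registry would consult `classify` first: a result of `"entry"` or `"safety"` selects the corresponding processing, and `None` means the interrupt belongs to ordinary program-check handling.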

Problem 3: Surgical Replacement of a Portion of a 
Module (e.g., a Control Section (CSECT))— The
problem here is to be able to replace a piece of the 
object code in a running program (usually a CSECT in 
a load module) while resolving all possible references to 
and from this code segment. As is well known, a
"CSECT" is an independent segment of a program and 
provides a scope of recognition of names. In an MVS 
environment, replacing a CSECT in the nucleus or 
kernel is a typical example of significant value. One may 
want to consider implementing even more local 
"patches" (part of a CSECT) with the proposed 
method. The solution is to treat each CSECT as a mod- 
ule being replaced in the manner of the invention, 
namely, the CSECT is compiled and linked indepen- 
dently as a separate program. Pointers in the old module 
which point to the old CSECT, are treated in the same 
way as pointers to a module. 

Problem 4: Changing Data Structures— In problem 
1, if the change to programs A and B involves a change 
to the layout or format of a data structure, the prior art 
does not offer any mechanism which enables the change 
to the program to be reflected in a pre-existing data 
structure. The solution provided herein does address 
this problem. If the data structure is private or local to 
the existing process, then by a proper choice of safety 
point, the data structure can be changed immediately
upon reaching the safety point. A program which effects
the change in the data structure is not part of the
update facility but is part of a change package which
also includes change-instructions and updated programs.
If the data structure is shared by multiple processes,
the data structure can be changed when all of the
processes which share the data structure have been
blocked at safety points. This technique is based on the
manner in which coordinated groups are handled in
accordance with the invention.

Problem 5: Synchronizing Multiple Tasks. Problem 1
referred to dependencies within a single process. The
problem is magnified if a certain set of processes
P1, P2, ..., Pn must all start calling the new versions
A', B' together, at a synchronized point, while other
processes Q1, Q2, ..., Qm must still call the old
versions of those programs indefinitely. This problem is
solved by treating the processes as a coordinated
group, as described below.

One of the objects of the invention is to provide an
improved method for non-disruptively installing new
versions of resident programs in a data processing
system while the system is running.

A further object of the invention is to non-disruptively
install new versions of operating system modules
while the system is running and one or more processes
are executing which use and access such modules.

Still another object of the invention is to provide a
dynamic update facility that uses the combination of
traps and safety points to effect transition between old
and new versions of operating system programs.

Briefly, in accordance with the invention, when a
new version of a module is installed, every invocation
of the old version is intercepted by the system. A
dynamic software update facility (DSUF) then determines
the state of the process which invoked the program.
If the process is "unsafe", the DSUF passes control
to the old version of the program. If the process is
"safe", the DSUF passes control to the new version of
the program. When the change is first installed, all
processes are initially considered unsafe. The developer of
the change provides, along with the new programs,
change-instructions including a set of conditions under
which a process can undergo a state transition from
unsafe to safe. The DSUF, upon its initialization, sets
itself up to capture all process transitions from an unsafe
state to a safe state. Thus the DSUF has complete
knowledge about the state of each process and can
exercise the conditional routing of control according to
the developer's specifications. The conditions for state
transition are called "safety points".

DRAWINGS

Other objects and advantages of the invention will be
apparent from the following description taken in
connection with the accompanying drawings wherein:

FIG. 1 is a block diagram of a data processing system
embodying the invention;

FIGS. 2A and 2B, when joined at reference line
A-A, form a block diagram illustrating the general
operation of the invention;

FIG. 3 is a flow chart of a trap point routine shown in
FIG. 1;

FIG. 4 is a schematic diagram illustrating the tree
structure of the change descriptor table shown in FIG.
1.

DETAILED DESCRIPTION

Referring now to the drawings, and first to FIG. 1, a
data processing system (DPS) 10 comprises a plurality
of central processing units (CPUs) 12 and 14 (also
referred to as CPU1 and CPU2) connected to a common
memory 16 and to an I/O subsystem 18 by busses 13 and
[illegible]. Subsystem 18 is further connected to various
[illegible] program library 20. [illegible] a large,
commercially available IBM ESA/390 computing system that
runs under the IBM MVS/ESA operating system and is
classed as a multi-processing DPS. FIG. 1 schematically
illustrates memory 16 as it appears when the system is
operating in the [illegible] described hereinafter. For the
purpose of [illegible] the invention, assume that memory 16
stores [illegible] programs A and B respectively, which
programs are part of the operating system and are shared
by various other programs and processes being executed
in either or both of CPU1 and CPU2. The general problem
which the invention addresses and solves is how to
replace modules 22 and 24 with updated modules 23 and 25
respectively containing new program A' and new
program B', while the system is running subject to the
constraints and sub-problems stated previously. The new
programs A' and B' are changed or updated versions of
old programs A and B. Each module may have more than
one entry point.

The general operation of DPS 10 will now be
described with reference to both FIGS. 1 and 2. Prior to
execution of PROCESS1 and PROCESS2, standard
process control blocks (PCBs) 17 and 19 are created by
the operating system in memory 16, which blocks
contain information about the processes. Such blocks or
extensions thereto are modified in accordance with the
invention to include a selectively settable marker or
flag, or bit for each change indicating whether the
corresponding process is safe or unsafe to use the updated
program(s) being provided with such change. Also,
information about the state of the conditions which
make a process eligible to be marked safe is stored in
the PCB or an extension thereto. These conditions
comprise the events and/or states when the process is
deemed "safe". A dynamic software update facility
(DSUF) 28 is loaded into memory 16 at system
initialization [illegible] selectively activated thereafter
[illegible].

Prior to activation of the DSUF, the new programs
are created by a programmer modifying the old
programs, recompiling, and linking the new programs
into load modules 23 and 25. Such modules are packaged
with machine readable change-instructions 27 and 29, for
dynamically installing the new modules and programs.
The change-instructions identify all entry points in the
programs and all safety points wherever located,
such safety points being the events or conditions which
make a process eligible for executing the new code. If
the change involves a change in a data structure, a
program called data structure change effector (DSCE)
is also packaged to make the change in the data
structure. The exemplary change does not include a data
change and thus no DSCE is shown in the drawings.

It is believed that a detailed discussion of "safety
points", at this place in the description, will facilitate a
better understanding of the invention. When a change
programmer is developing a change to a system, the
programmer can analyze the change modules to determine
the dependencies of tasks or processes on execution
of the old version and of the new version. During
such analysis, the programmer can determine the
conditions which must be satisfied when processes can stop
executing the old program and start executing the new
program. Often, conditions can be translated into events
in the life of a task, or the combination of an event with
an observable state of the process. These events, states,
and associated conditions are deemed to be "safety
points". A safety point is specified by the change
programmer designing a change. Examples of safety points
are when a task is: started after the change was
implemented (that is, all new tasks are "safe"), entering or
exiting a particular module (either one of the changed
modules or another unchanged module), making a
particular system call, executing an instruction at a given
offset at a given module, observed as swapped out,
observed as being in a problem or user state (as opposed
to a supervisor state), observed as being in a wait state
awaiting completion of some other task or awaiting
some new work to be assigned to it, running under a
given job name, and not running under a given job
name.

These events and conditions are either observed by
the system since they include a system call, or observed
by the DSUF which receives control and tests for
safety conditions before marking a process as safe.
Subsequently, any task attempting to execute the old code
will be routed either to the old code if the process is
unsafe or to the new code if the process is safe. The
change-instructions specify the conditions which allow
the DSUF to determine which version of the program
can be used by the process. Safety points can be in the
old program, in the process itself, or in some other
program, task or process. The safety points also include
code, referred to hereinafter as "safety point code", that
is executed at or near the safety point. Such code
includes a safety point trap, a "wait" instruction that
places the process in a wait state, etc.

DSUF 28 operates in phases, the different phases
being shown in FIG. 2 as labeled boxes located along
the left side of FIG. 2. Different more detailed actions
which occur during such phases are shown as labeled
boxes located along the right side of FIG. 2. The
different phases include an INSTALL phase 34, a
PREPARE phase 36, an ACTIVATE phase 38, a RUN
phase 40, and a COMMIT phase 42.

When DSUF 28 is activated, INSTALL phase 34
performs step 43 to store load modules 23 and 25 and
change-instructions 27 and 29 in program library 20.

Next, during PREPARE phase 36, step 44 initially
marks all processes and tasks as "unsafe". Such marking
is done by setting the safety status in the corresponding
PCBs 17 and 19. Step 46 then loads copies of the new
programs A' and B' from the library into memory 16 in
such a manner that the new programs are initially
"hidden" from the rest of the system. That is, no pointer is
created allowing direct access to either program by the
rest of the system; only DSUF has direct access
initially. Step 47 analyzes the changes by reading the
change-instructions 27 and 29 and step 48 then creates a
change descriptor (CHDESC) table 32, in memory 16,
for storing the change information including the
specific conditions and events which make each task
eligible to be "safe". Table 32 is described below relative to
FIG. 4 and contains information for controlling the
update process.

During ACTIVATE phase 38, step 50 enables an
intercept in a standard program check first level
interrupt handler (PCFLIH) 26 so that from this point on,
DSUF 28 receives control on every program check
interrupt. Step 52 installs traps 53 and 55 in memory 16
at all entry points in the old programs and at all safety
points in old programs A and B and elsewhere as
predetermined by the change programmer and set forth in the
change-instructions. Each trap is a hex byte x00 in the
first byte and may include a second byte that is an
access index into the hash table in CHDESC 32, described
below. Alternatively, a trap can comprise a machine
instruction which either causes an interrupt or
otherwise enables the invocation of the DSUF. Step 54 then
saves time stamps of when the traps were stored. The
system is thus initialized and prepared for the run phase
during which various processes are executed or run in
CPUs 12 and 14.

During RUN phase 40, the system appears as shown
in FIG. 1. When a process being executed by one of the
CPUs, e.g. PROCESS1, enters program A, an attempt
is made to execute the first trap byte x00. Such code is
an invalid instruction and the attempt to execute it
produces a program check interrupt causing PCFLIH 26 to
be executed in step 56. DSUF 28 receives control from
PCFLIH, examines the cause of the interrupt, and
determines in step 58 whether the trap is a DSUF trap.
When an interrupt occurs, information is passed
indicating the source of the interrupt and the determination of
step 58 looks at such source. If the trap is not a DSUF
trap, control is returned to PCFLIH to continue
execution. If step 58 produces a positive or 'YES' result, then
trap point routine 30 is executed in step 60 to determine
from the safety status recorded in the associated PCB
whether program A or program A' should be executed,
and to route or pass control to the appropriate new or
old program for execution.

At a later time, a user can request the change to be
committed and thereby initiate COMMIT phase 42.
Alternatively, the COMMIT phase can be initiated
automatically, e.g. by lapse of a predetermined amount
of time sufficient in duration to reasonably insure that
the new programs will work properly. In COMMIT
phase 42, step 62 determines if all the processes are safe,
i.e., have all the processes been marked "safe". If so,
step 64 switches over to the new programs and step 66
ends the COMMIT phase. If any process is not safe,
step 62 bypasses step 64 and the new programs are not
committed. The switching over to the new programs is
done by locating in each entry point node in CHDESC
32 the arrays of addresses that point to the old code and
then storing in such addresses pointers to the new code.
Any task still using a "saved" old address executes
correctly since the trap remains in place and routine 30
routes all callers of the old code to the new code. The
changes can be backed out of by a similar process of
applying an ordinary change where old and new
versions exchange their roles. The advantage over plain
removal of the trap is that safety rules can (optionally)
be applied. For example, if a process is already
executing the new code, it can keep doing so but all
subsequent new processes would go back to the old code.

Referring to FIG. 3, when trap point routine 30 is
executed, step 68 checks the reason for invocation, i.e.,
whether the reason is because of an entry point trap or
a safety point trap. The tripping of a safety point trap
indicates a state transition from unsafe to safe has
occurred. If the reason is an entry point trap, step 70 looks
at a safety status marker in the PCB of the process that
tripped the trap, and decides if the state of the process is
safe or unsafe. Step 72 then routes a safe process to the
corresponding new program. If the process is unsafe,
step 74 routes the process to the old program. Such
routing is schematically shown in FIG. 1 by the OR
functions between routine 30 and the programs A or A'
and B or B'. When a trap is written or loaded into an
entry point, the instruction that was previously there is
overwritten. Since there is a need to be able to produce
the results of the old code including results of the
instruction that was destroyed, the problem can be
overcome in at least three different ways. First, the
instruction, before it is overwritten, is copied into step 74 and is
executed immediately before a branch instruction
which branches to the instruction in the old program
immediately following the instruction that was
destroyed. Second, since many entry points merely
contain branch instructions for bypassing definitional
information in the old program, a branch instruction can be
placed in step 74 to branch to the target instruction
in the old program. Third, a fresh copy of the old
program, called fresh old or FROLD, can be stored before
the traps are loaded, and the fresh copy executed
instead of the old program.

If a state transition event or safety point trap invoked
routine 30, step 76 decides if the process that tripped the
trap meets the conditions to become safe. If the process
does meet such conditions, step 77 decides if the process
is part of a coordinated group. A coordinated group
might be a parent process and subordinate "child"
processes, and indications thereof are placed in the
associated PCBs. If the process is not part of such a group,
step 83 then marks the process as "safe" in the
corresponding PCB. Step 85 then executes the DSCE if one
exists, and step 87 then returns to the process. On the
other hand, if the process is part of a coordinated group,
step 78 marks the process "safe". Step 79 checks to see
if the process is the last process remaining in the group.
If it is, step 80 executes any DSCE, and step 81 then
unblocks or unsuspends all of the suspended processes
and returns to the system. If step 79 results in a negative
determination, step 84 suspends the process and step 86
returns control to the system to execute other processes.

It should be apparent that because each process is

address of hash table 114. Node 92 is the root of a nodal
tree data structure having a plurality of nodes and
pointers. If there is no node for a pointer to point to, a
null value is set in the pointer field. There is one change
node for each change. Change node 95 contains a
pointer 96 to a dependent module node 97 and a pointer
127 to a sibling change node 126. There is one module
node for each module in a change. For example, if two
modules 22 and 24 are both being revised with a single
change, there would be a module node for each.
Module node 97 has a pointer 98 to a dependent trap point
node 102 and a pointer 99 to a sibling module node 100.
There is one trap point node for each trap point in a
change. Further trap point nodes 106, 110 are siblings of
node 102 and are pointed to by pointers 104, 108.
Additional [illegible] pointed to by pointer [illegible].
Module node 100 contains a pointer 124 to any
further sibling module nodes and a pointer 125 to the
[illegible] further trap point [illegible]. Change node 126
contains [illegible] to dependent module nodes (not
shown) [illegible].

[illegible] Each field contains a pointer to a different,
single trap point node, with trap point nodes 102, 106, 110,
and [illegible] respectively pointed to by pointers
[illegible]. As previously indicated, the first byte of a
trap contains x00 and the second byte contains an access
index [illegible] particular trap. Each trap
point node is thus accessible through two paths. One
path follows pointer 93 from system node 92 to table
114. The access index in the second byte of a
trap [illegible] to a particular field that contains a pointer to
a particular trap point node. The second path follows
pointer 94 from system node 92 and additional pointers
[illegible] change node 95 to get to a particular trap point
node. For example, trap point node 110 is accessible by
pointers 96, 98, 104, and 108. Since the access index is
one byte, only 256 entries can be made directly in table
114. To provide for a larger number of trap points, the
indices are assigned on a first come, first served basis
but with wraparound so that there could be plural traps
initially marked unsafe, until such time that the process associated with each table entry. To accommodate this, 
is marked safe, whenever the process attempts to enter ^ collision chain may be provided for the additional 
the old program, an interrupt is generated which causes entries which chain is accessed through a hash collision 
the process to be routed to the old program. When such ^ ^^^4. The chain links those trap nodes associated 
process subsequently hits a safety point trap, the process 50 with the same hash table entries, 
is then marked safe and subsequent attempts to enter the The various nodes store information not all of which 
old program direct the process to the new program. is required or used by the invention. Bach change node. 
The method prevents any disruption to processes that in addition to the two pointers previously described, 
may be executing the old programs at the time of the stores the safety point type for this change, a time stamp 
change. A very simple example of an update is one 55 of completion of the ACTIVATE phase for this 
where a process or task that starts after the new pro- change, and flags signifying: 
gram has been installed must execute the new programs PREPARE completed: yes/no 
while processes started before such point in time exe- ACTIVATE started: yes/no 
cute the old code. In such example, the entry point of ACTIVATE completed: yes/no 
the new program is defined as the safety point and the 60 COMMIT completed: yes/no 
time stamp of when a trap is installed in step 54 can be Change is ordinary forward change, or it b a backout 
compared to the time when a process starts, to deter- of another change. 

mine if such process is safe or unsafe. Change was backed out by another change. 

As shown in FIO. 4. CHDESC table 32 comprises a Each module node further contains module name, 
system node 92 having a pointer 93 to a trap point hash 65 address where module is stored in memory, total sizes of 
table 114 and a pointer 94 to a change node 95. The FROLD if one is provided and of new module, pointer 
address of system node 92 is stored in memory and to same module node in a previous change which is 
made known to trap processing routine 30, as is the base overridden by this change, and flags indicating whether 
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the areas occupied by the FROLD and new versions for 
this change, have already been freed. 
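
The node hierarchy and the two access paths described above can be illustrated with a minimal sketch. It is not the patented implementation: the class names, field names, and the `register_trap`, `find_by_index`, and `find_by_tree` helpers are hypothetical, and each node carries only a few of the fields the text enumerates. The sketch assumes indices are handed out in order and wrap modulo 256, so that traps sharing a table entry are linked on a collision chain.

```python
from dataclasses import dataclass, field
from typing import List, Optional

TABLE_SIZE = 256  # a one-byte access index allows 256 direct table entries


@dataclass(eq=False)
class TrapPointNode:
    address: int                                      # where the trap is located
    hash_index: int = 0                               # access index stored in the trap
    next_sibling: Optional["TrapPointNode"] = None    # next trap point in the same module
    next_collision: Optional["TrapPointNode"] = None  # next node sharing this table entry


@dataclass(eq=False)
class ModuleNode:
    name: str
    first_trap: Optional[TrapPointNode] = None        # dependent trap point node
    next_sibling: Optional["ModuleNode"] = None       # sibling module node


@dataclass(eq=False)
class ChangeNode:
    first_module: Optional[ModuleNode] = None         # dependent module node
    next_sibling: Optional["ChangeNode"] = None       # sibling change node


@dataclass(eq=False)
class SystemNode:
    hash_table: List[Optional[TrapPointNode]] = field(
        default_factory=lambda: [None] * TABLE_SIZE)
    first_change: Optional[ChangeNode] = None
    next_index: int = 0                               # hypothetical allocation counter

    def register_trap(self, module: ModuleNode, address: int) -> TrapPointNode:
        # Indices are assigned first come, first served, with wraparound.
        node = TrapPointNode(address, self.next_index % TABLE_SIZE)
        self.next_index += 1
        # Append to the module's sibling chain.
        if module.first_trap is None:
            module.first_trap = node
        else:
            tail = module.first_trap
            while tail.next_sibling is not None:
                tail = tail.next_sibling
            tail.next_sibling = node
        # Link into the collision chain for this table entry.
        node.next_collision = self.hash_table[node.hash_index]
        self.hash_table[node.hash_index] = node
        return node

    def find_by_index(self, index: int, address: int) -> Optional[TrapPointNode]:
        # First path: hash table entry, then the collision chain.
        node = self.hash_table[index]
        while node is not None and node.address != address:
            node = node.next_collision
        return node

    def find_by_tree(self, address: int) -> Optional[TrapPointNode]:
        # Second path: change nodes, then module nodes, then trap point nodes.
        change = self.first_change
        while change is not None:
            module = change.first_module
            while module is not None:
                trap = module.first_trap
                while trap is not None:
                    if trap.address == address:
                        return trap
                    trap = trap.next_sibling
                module = module.next_sibling
            change = change.next_sibling
        return None
```

In this sketch the hash path costs one table lookup plus a short chain walk, which suits the interrupt-time lookup, while the tree path visits every node and is better suited to bookkeeping over a whole change.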

Each trap point node further contains address of
where trap is located, expected program status word
(PSW) at time of trap, FROLD address, new version
address, address of entry point node for this entry point
in the next active change that overrides this change (if
this is a 0, then it is still possible that there is a prepared
change ready to override. This field is stored only during
the ACTIVATE phase of the next trap point node),
address of trap point node of this trap point in a previous
active change, the one that is or will be overridden
by this change or 0 (this field is stored during the
PREPARE phase of this node), address of trap point routine
30 for this entry point, addresses of parent change and
module nodes for this module/change instance, array of
all addresses that need to be fixed during COMMIT,
indicators indicating if the new code was used at least
once and the old code was used at least once, hash table
index for this node, instruction to be executed to set a
call register correctly, work areas to be used for storing
information overridden during ACTIVATE, and an
indicator whether above addresses are real or virtual.

When a safety point trap is tripped, step 76 determines
if the task or process meets the conditions of
eligibility for becoming safe. This step is accomplished
by going to the source of the interrupt, namely the
safety point trap code, and obtaining the hash table
index for such trap. Using the index along with the base
address of the hash table, step 76 then accesses the
corresponding trap entry node and obtains therefrom the
pointer to the corresponding change node, which contains
information on what conditions establish the safety
point. Step 76 then accesses the PCB for the process
that tripped the trap and determines from such conditions
whether the task is eligible to be safe.
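
The routing and safety point handling performed by the trap processing routine can be sketched as follows. This is a simplified model under stated assumptions, not the patented routine: `PCB`, `Group`, `route_entry`, and `on_safety_point` are hypothetical names, the eligibility test of step 76 is passed in as a callable, the DSCE is modeled as an optional callable, and the returned strings are labels invented for the example. The text does not say what happens when a process is not eligible, so the sketch simply leaves it unsafe.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Group:
    members: List["PCB"] = field(default_factory=list)


@dataclass(eq=False)
class PCB:
    name: str
    safe: bool = False            # marker is initially set to unsafe
    suspended: bool = False
    group: Optional[Group] = None  # set when the process is in a coordinated group


def route_entry(pcb: PCB, old_program, new_program):
    """Entry point trap: unsafe processes run the old program, safe ones the new."""
    return new_program if pcb.safe else old_program


def on_safety_point(pcb: PCB,
                    is_eligible: Callable[[PCB], bool],
                    dsce: Optional[Callable[[], None]] = None) -> str:
    """Safety point trap handling, following steps 76 through 87."""
    if not is_eligible(pcb):                     # step 76
        return "not eligible"                    # assumption: process stays unsafe
    pcb.safe = True                              # step 83 (alone) or step 78 (group)
    if pcb.group is None:                        # step 77: not in a coordinated group
        if dsce is not None:
            dsce()                               # step 85
        return "resumed"                         # step 87
    if all(m.safe for m in pcb.group.members):   # step 79: last one to become safe
        if dsce is not None:
            dsce()                               # step 80
        for member in pcb.group.members:         # step 81: release the whole group
            member.suspended = False
        return "group released"
    pcb.suspended = True                         # step 84
    return "suspended"                           # step 86
```

For a parent and child in one group, the parent is suspended when it reaches its safety point first, and both run the new program only after the child also becomes safe.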

It should be apparent to those skilled in the art that
many changes can be made in the details and arrangements
of steps and parts without departing from the
scope of the invention as defined in the appended
claims.

What is claimed is: 

1. The method of dynamically updating an old operating
system program (hereinafter "old program")
stored in a main memory of a data processing system
(DPS) while said DPS is running and executing at least
one task that accesses said old program from time to
time in a multitasking mode, said method comprising:

(A) storing in said memory a new program that is an
updated version of said old program;

(B) establishing an executable safety point in said
DPS which produces a machine observable safety
point condition;

(C) storing in said memory a selectively settable first
marker for indicating whether said one task is safe
or unsafe for executing said new program, said first
marker being initially set to indicate said one task is
unsafe;

(D) executing said one task and entering said old
program through an entry point therein;

(E) in response to entering said entry point, examining
said first marker and in response to noting said
one task is unsafe, passing control to said old program
for execution thereof;

(F) executing said safety point to produce said safety
point condition;

(G) in response to observing said safety point condition,
setting said first marker to safe;

and (H) after said marker has been set to safe, executing
said new program each time said first task enters
said old program.

2. The method in accordance with claim 1 compris- 
ing: 

(I) installing an entry point trap in said old program at 
said entry point before performing step (D); 

step (D) comprises tripping said entry point trap; 

and step (E) is performed in response to tripping said 
entry point trap.

3. The method in accordance with claim 2 wherein 
said entry point trap is effective when tripped to gener- 
ate a first interrupt, and said method further comprises 

(J) installing an interrupt handler in said DPS for 
intercepting said first interrupt and branching to 
step (E). 

4. The method in accordance with claim 3 wherein: 
said safety point comprises a safety point trap which 

when executed in step (F) generates an interrupt of 
the same type as said first interrupt generated by 
said entry point trap; 
and step (J) further comprises intercepting said sec- 
ond interrupt and branching to step (G). 

5. The method in accordance with claim 4 wherein 
said step (E) is done by first deciding that said first 

interrupt is generated by said entry point trap, 
before examining said marker and passing control; 
and step (G) is done by first deciding said second 
interrupt is generated by said safety point trap, 
before setting said marker to safe. 

6. The method in accordance with claim 3 wherein 
said entry point trap comprises an invalid op code effec- 
tive when executed to generate a program check inter- 
rupt, and said interrupt handler comprises a program 
check first level interrupt handler having an intercept 
for said program check interrupt. 

7. The method in accordance with claim 1 wherein a 
second task is being executed which also attempts to 
access said old program, and said method further com- 
prises: 

storing in said memory a second marker which is 
selectively settable to alternatively indicate the 
status of said second task is safe or unsafe for exe- 
cuting said new program, said second marker being 
initially set to indicate said second task is unsafe; 

and thereafter performing for said second task steps 
similar to steps (D) through (H) whereby both said 
second task thereafter causes said new program to
be executed. 

8. The method in accordance with claim 7 wherein 
said first task and said second task form a coordinated 
group wherein said first task is suspended when said 
first marker is set to safe, until said second marker is set 
to safe whereupon said first task is unsuspended. 

9. The method in accordance with claim 1 wherein 
said one task comprises executable code and a control 
block stored in said memory for storing information 
controlling operation of said one task, said first marker 
being located in said control block. 

10. The method in accordance with claim 1 wherein 
said safety point comprises a safety point trap which is 
executed in step (F). 

11. The method of dynamically updating an old oper- 
ating system program (hereinafter "old program") 
stored in a main memory of a data processing system 
(DPS) while said DPS is running and executing at least 
one task that accesses said old program from time to 
time, said method comprising: 
(A) installing in said DPS a selectively activated
dynamic software update facility (DSUF) having a
trap processing routine;

(B) storing in said memory prior to activation of said
DSUF a new program and a status marker for said
one task, said new program being an updated version
of said old program, each status marker being
selectively settable to indicate that said one task is
safe or unsafe to execute said new program, said
status marker being initially set to unsafe;

(C) activating said DSUF by

(C1) storing an entry point trap at each entry point
into said old program, each entry point trap
generating an entry point interrupt when such
each trap is tripped,

(C2) establishing a safety point in said DPS which
safety point includes safety point code effective
when executed to signify said one task is eligible
to be marked safe and execute said new program,

and (C3) installing in said DPS an interrupt handler
for intercepting said entry point interrupt and
executing said trap processing routine;

(D) after activation of said DSUF, running said DPS
to execute said one task and, in response to said one
task entering said old program, performing steps
comprising

(D1) tripping said entry point trap to generate said
entry point interrupt,

and (D2) executing said trap processing routine in
response to said entry point interrupt from step
(D1) to examine said status marker and in response
to noting said status is unsafe, branching
to said old program for execution thereof;

(E) further operating said system until said safety
point code is executed and, in response thereto,
setting said status marker safe;

and (F) thereafter executing said new program each
time said one task enters said old program and trips
one of said entry point traps.

12. The method in accordance with claim 11 wherein:
said safety point code is a trap operative when tripped
to generate a safety point interrupt;
and step (E) comprises executing said trap to generate
said safety point interrupt and branching to said
trap processing routine, said trap processing routine
being operative to set said status marker safe.

13. The method of claim 12 wherein said safety point
trap and said entry point trap generate the same type of
interrupt, and said trap processing routine performs the
steps comprising:

examining the source of each interrupt and deciding
whether such interrupt is a safety point interrupt or
an entry point interrupt, and branching to a routing
routine when said interrupt is an entry point interrupt
and to a safety point processing routine when
said interrupt is a safety point interrupt;

and executing said routing routine to branch to said
old program when said task is unsafe and to said
new program when said task is safe.

14. The method in accordance with claim 13 comprising:

executing said safety point processing routine to decide
if said task meets predetermined conditions for
such task to become safe, and to set said status
marker to safe when said task meets said predetermined
conditions.

15. The method in accordance with claim 14 wherein
a safety point is selected from a group of conditions
comprising:

when a task is started, upon a task entering a particular
module, upon a task exiting a particular module,
when a task makes a particular system call, if a task
executes an instruction at a given offset at a given
module, when a task is swapped out, when a task is
observed as being in a problem state, when a task is
observed as being in a wait state awaiting completion
of some other task, when a task is awaiting
new work to be assigned to it, when a task is running
under a given job name, and when a task is
running other than under a given job name.

16. The method in accordance with claim 11 wherein
said system includes at least one additional task which
accesses said old program, said DPS operating in a
multitasking mode, and said method further comprises:
storing in said memory a second status marker for
said additional task, said second status marker
being selectively settable to indicate that said additional
task is safe or unsafe to execute said new
program, said second status marker being initially
set to unsafe;
establishing a second safety point in said DPS which
includes second safety point code effective when
executed to signify said additional task is safe to
execute said new program;
and thereafter performing in steps (D) through (F) in
such a manner as to perform steps for said additional
task similar to those performed for said one
task whereby said additional task executes said new
program each time it enters said old programs and
trips an entry point trap therein.

17. The method in accordance with claim 16 wherein 
said one task and said additional task are coordinated, 
and said method comprises: 

suspending execution of said one task after it becomes 
safe, until said second safety point is executed and 
said additional task becomes safe; 

setting said second status marker to safe; 

and then unsuspending said one task whereby each 
task can then execute said new program. 

18. The method in accordance with claim 17 wherein
said DPS is a multiprocessing system comprising a plu- 
rality of processors for executing said tasks, said mem- 
ory and programs stored therein being shared by tasks 
executing in said processors. 

19. The method in accordance with claim 16 wherein 
said memory further stores a data structure which is 
accessed by one or more of said tasks, and said method 
further comprises: 

storing in said memory a data structure change effec- 
tor program for changing data in said data struc- 
ture; 

and, in response to said one or more tasks being 
marked safe, executing said data structure change 
effector program to change the data in said data 
structure. 

* * * * * 